Introduction

Overview
Wars are complex events born of geopolitical, cultural or economic strife. They often span many years and ultimately cost lives, livelihoods and peace. During wars, countries quickly adopt ideologies, form allegiances, and bend their economic and scientific priorities to a single-minded military focus. Although the causes of this displacement of peace may vary, is there a precursory pattern to it? Does the landscape change after the end of a prolonged conflict? Do certain actors benefit more? Do some lose more than others? And most importantly, are there reliable predictors of these events that change the course of history?

Our Aim
We are particularly interested in studying the changes a country undergoes before and after it enters a war. We want to see how alliances and strategies change, and how war affects trade, commerce and the economics at play. We also want to compare and contrast the characteristics of countries that won wars with those that lost. Our eventual goal is to find factors that indicate which countries will enter a war, and to see how these factors/predictors change over time.

Scope and Timeline
To limit our scope we will explore the data with a particular emphasis on the United States of America and the wars it has fought since 1900. At various points, we may have to include comparisons between countries and the US and we will explore the data breadth-wise to draw meaningful insights.

For our timeline, we plan to look at events/activity leading up to, during and following the major US wars, namely:
* WWI: 1914-1918
* WWII: 1939-1945
* Cold War: 1947-1991
* Korean War: 1950-1953
* Vietnam War: 1955-1975
* War in Afghanistan: 2001-2010

The Data, Team Members and The Roles

The Correlates of War Project is a treasure trove of information. We have a special interest in the following datasets: Trade, National Materials Capabilities, Alliances and Militarized Interstate Disputes.

Our Plan
We’ve decided to divide and conquer the work, each taking a subset of the data and exploring it. After some time, we will regroup to see what we’ve learned so far and switch data sets to see whether there are more insights to be found or different ways to visualize the existing data. Lastly, we want to drill down into particular variables and plot correlations or predictors for the final output.

Phase 1:
* Cynthia to analyze National Materials Capabilities and Alliances
* Vineet to analyze Militarized Interstate Disputes and Trade

Phase 2:
* We are going to switch the data sets we are looking at to see if the other person can discern any new insights or creative ways of presenting the data.
+ Cynthia to analyze Militarized Interstate Disputes and Trade
+ Vineet to analyze National Materials Capabilities and Alliances

Phase 3:
* Cynthia and Vineet to come together and look at the interaction of different variables. For example, how did a change in trade impact the NMC?

Analysis of Data Quality

Provide a detailed, well-organized description of data quality, including textual description, graphs, and code.

The Correlates of War datasets are a product of the Correlates of War Project (COW), founded in 1963. COW’s goal is to “facilitate the collection, dissemination, and use of accurate and reliable quantitative data in international relations.” [add source] From the COW datasets we focused on four: NMC, MID, Alliances and Trade.

Overall

In this section we can talk about consistency, conformity and integrity.

Overall, the consistency of these data sets is quite good. We have not found any evidence of conflicting information.

For each individual data set, address accuracy, completeness, and duplication.

National Materials Capabilities

The overall data quality of the NMC dataset is very good. There are roughly 14,000 entries, and 89% of them have no missing values. There is one entry for each country per year. The accuracy of the data is also very good: as countries are dissolved and new ones are formed, the data keeps track of them. For example, the graph below depicts the CINC of Austria-Hungary from 1900 to 1918, the end of WWI, when the Austro-Hungarian Empire was dissolved. Immediately after that, you see data points for Austria and Hungary separately. The same holds true for many other countries: data only appears once a country has declared independence or has just been created.

# Assumes NMC_orig, member_alliances and the conflict-window table d
# (columns x1, x2, Conflict) were created earlier in the report
library(dplyr)
library(ggplot2)
# Recode the missing-value code (-9) as NA in every NMC measure
NMC_test <- NMC_orig
nmc_cols <- c("cinc", "irst", "milex", "milper", "pec", "tpop", "upop")
NMC_test[nmc_cols] <- lapply(NMC_test[nmc_cols], function(x) replace(x, x %in% -9, NA))
# Keep 1900-2007 and the three states of interest
NMC_test <- filter(NMC_test, year %in% 1900:2007)
test <- c("Austria-Hungary", "Austria", "Hungary")
test_ccode <- member_alliances$ccode[match(test, member_alliances$state_name)]
NMC_test <- filter(NMC_test, ccode %in% test_ccode)
# CINC over time, with the conflict periods shaded in the background
ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="CINC") +
  labs(color ='Country Abbreviation')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=.09, fill=Conflict),alpha=0.15) +
  geom_line(data = NMC_test, aes(x = year, y = cinc, color = stateabb, group = stateabb)) +
  ggtitle("CINC for Austria-Hungary") + 
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5), legend.position = "bottom") 
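
The completeness and duplication claims above can be checked with a few lines of base R. This is only a minimal sketch: the toy data frame `nmc_toy` and the helper `share_complete()` are hypothetical stand-ins for the real NMC table, and it assumes that -9 is the missing-value code, as in the recoding above.

```r
# Toy stand-in for the NMC table (made-up numbers); -9 marks a missing value
nmc_toy <- data.frame(
  ccode = c(1, 1, 2, 2),
  year  = c(1900, 1901, 1900, 1901),
  milex = c(100, -9, 50, 60),
  cinc  = c(0.19, 0.20, 0.04, -9)
)

# Hypothetical helper: recode the sentinel to NA, then return the share
# of rows that have no missing value in the chosen columns
share_complete <- function(df, cols, sentinel = -9) {
  df[cols] <- lapply(df[cols], function(x) replace(x, x %in% sentinel, NA))
  mean(complete.cases(df[cols]))
}

share_complete(nmc_toy, c("milex", "cinc"))

# Duplication check: there should be one row per country and year
any(duplicated(nmc_toy[c("ccode", "year")]))
```

On the real data, the same helper would be called with the seven NMC measure columns.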

Alliances

The overall data quality of the data set was very good. The information in the alliances data matched historical facts exactly.

Graph of the NATO alliance and when its members joined

version id == 227

Some of the inconsistencies that I noticed involve alliances that were still in effect as of 12/31/2012, when this data set was last updated. In some of the datasets, an ongoing alliance has dyad_end_year (the field that represents the year in which the alliance was terminated) set to 2012; in others it is set to ‘NA’. When dyad_end_year is set to 2012, it is hard to know whether an alliance actually ended in 2012 or is still ongoing.
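
One way to make this ambiguity explicit is to carry a separate status column instead of overloading the end year. The sketch below is illustrative only: the toy records, the `ongoing` flag and the `end_for_plot` column are assumptions, not fields of the COW data.

```r
# Toy dyad records (hypothetical states): NA and 2012 are both used for
# alliances that are still in effect, mirroring the inconsistency above
al_toy <- data.frame(
  state_name2   = c("State A", "State B", "State C"),
  dyad_st_year  = c(1949, 1951, 1949),
  dyad_end_year = c(NA, 2012, 1966)
)
last_obs <- 2012  # last year covered by the data set

# Hypothetical flag: treat NA or the last observed year as "ongoing".
# Alliances that really ended in 2012 cannot be told apart this way and
# would still need to be checked by hand.
al_toy$ongoing <- is.na(al_toy$dyad_end_year) | al_toy$dyad_end_year == last_obs

# Use one consistent end year for plotting
al_toy$end_for_plot <- ifelse(al_toy$ongoing, last_obs, al_toy$dyad_end_year)
al_toy
```

Keeping the flag separate means the plotted end year never has to double as a status indicator.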

  1. US had only one alliance before 1925 and Japan has no more alliances now
qa <- filter(dir_al_year, dir_al_year$state_name1 %in% c("United States of America"))
qa <- qa[, c(3,5,8,11,18)]
qa$dyad_end_year[qa$dyad_end_year %in% NA] = 2016
#qa <- gather(qa, st_ed, year, dyad_st_year:dyad_end_year )
ggplot() + 
  geom_point(data =qa, aes(x=year, y = state_name2), alpha = .5) +
  xlab("Year") +
  ylab("Country")+
  ggtitle("")+
  theme_classic() 

Trade

Militarized Interstate Disputes

Below for reference only - this is what we want to talk about in this section

Completeness: Is all the requisite information available? Are data values missing, or in an unusable state? In some cases, missing data is irrelevant, but when the information that is missing is critical to a specific business process, completeness becomes an issue.

Conformity: Are there expectations that data values conform to specified formats? If so, do all the values conform to those formats? Maintaining conformance to specific formats is important in data representation, presentation, aggregate reporting, search, and establishing key relationships.

Consistency: Do distinct data instances provide conflicting information about the same underlying data object? Are values consistent across data sets? Do interdependent attributes always appropriately reflect their expected consistency? Inconsistency between data values plagues organizations attempting to reconcile between different systems and applications.

Accuracy: Do data objects accurately represent the “real-world” values they are expected to model? Incorrect spellings of product or person names, addresses, and even untimely or not current data can impact operational and analytical applications.

Duplication: Are there multiple, unnecessary representations of the same data objects within your data set? The inability to maintain a single representation for each entity across your systems poses numerous vulnerabilities and risks.

Integrity: What data is missing important relationship linkages? The inability to link related records together may actually introduce duplication across your systems. Not only that, as more value is derived from analyzing connectivity and relationships, the inability to link related data instance together impedes this valuable analysis.

Based on the data how strong are your observations

Executive Summary

Provide a short nontechnical summary of the most revealing findings of your analysis with no more than 3 static graphs or one interactive graph (or link), written for a nontechnical audience. The length should be approximately 2 pages (if we were using pages…) Do not show code, and take extra care to clean up your graphs, ensuring that best practices for presentation are followed.

National Materials Capabilities

Alliances

Trade

Militarized Interstate Disputes

Main Analysis

Provide a detailed, well-organized description of your findings, including textual description, graphs, and code. Your focus should be on both the results and the process. Include, as reasonable and relevant, approaches that didn’t work, challenges, the data cleaning process, etc.

National Materials Capabilities

NMC <- NMC_orig 
NMC$cinc[NMC$cinc == -9| is.na(NMC$cinc)] <- 0
NMC$irst[NMC$irst == -9| is.na(NMC$irst)] <- 0
NMC$milex[NMC$milex == -9| is.na(NMC$milex)] <- 0
NMC$milper[NMC$milper== -9| is.na(NMC$milper)] <- 0
NMC$pec[NMC$pec == -9| is.na(NMC$pec)] <- 0
NMC$tpop[NMC$tpop == -9| is.na(NMC$tpop)] <- 0
NMC$upop[NMC$upop == -9| is.na(NMC$upop)] <- 0
all_year <- c(1900:2007)
NMC_ratios <- c("")
for (year_t in all_year){
  yr <- filter(NMC, NMC$year %in% year_t) 
  max <- apply(yr[, c(4:9)], 2, sum)
  #max <- as.numeric(max[4:9])
  for (i in 4:9){
    yr[,i] = as.numeric(yr[,i]/max[i-3])
  }
  NMC_ratios <- smartbind(NMC_ratios , yr)
}
NMC_ratios  <- NMC_ratios [c(2:nrow(NMC_ratios )), c(2:length(NMC_ratios))]
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

National Materials Capability measures the power of a country based on 6 values: total population, urban population, military personnel, military expenditures, iron and steel production and energy consumption. NMC is purely a measure of military and economic means of influence rather than diplomacy or other forms of influence.

CINC is the composite score to measure the power of a country using the average of the ratios, calculated as described below.

[Figure: CINC calculation (./cinc_calc.png)]
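
As a rough illustration of the calculation described above, each component is converted to the country's share of the yearly total, and the six shares are averaged. The sketch below uses made-up numbers for three hypothetical countries in a single year; the column names follow the NMC fields used in this report.

```r
# Toy one-year table for three hypothetical countries (made-up numbers)
toy <- data.frame(
  country = c("A", "B", "C"),
  milex   = c(50, 30, 20),
  milper  = c(10, 10, 20),
  irst    = c(5, 3, 2),
  pec     = c(40, 40, 20),
  tpop    = c(100, 200, 100),
  upop    = c(30, 50, 20)
)
components <- c("milex", "milper", "irst", "pec", "tpop", "upop")

# Share of the yearly total for each component (one column per component)
shares <- sapply(toy[components], function(x) x / sum(x))

# CINC is the mean of the six shares; the scores sum to 1 across countries
toy$cinc <- rowMeans(shares)
toy[c("country", "cinc")]
```

Because every component is a share of the same yearly total, a country's score can fall even when its absolute values rise, as long as other countries grow faster.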

Below is the CINC score for today’s major powers, all of which also participated in the major wars of the past. The years of the wars mentioned above are highlighted.

countries <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Japan", "China")
country_code <- member_alliances$ccode[match(countries, member_alliances$state_name)]
NMC_mp <- filter(NMC, NMC$year %in% all_year)
NMC_mp <- filter(NMC_mp, NMC_mp$ccode %in% country_code)
NMC_mp$country = ""
for( i in c(1:length(country_code))){
  NMC_mp$country[NMC_mp$ccode == country_code[i]] <- countries[i]
}
ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="CINC") +
  labs(color ='Country')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$cinc), fill=Conflict),alpha=0.15) +
  geom_line(data = NMC_mp, aes(x = year, y = cinc, color = country, group = stateabb)) +
  ggtitle("CINC by Year for Major Powers Today") + 
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5), legend.position = "bottom") 

As you can see, CINC is constantly changing. Interestingly, the CINC spikes for the US right after WWI and WWII, where the US had a major role in the outcome of the wars. There is also an increase in the US’s CINC score during the Korean War. Following the Korean War, the US’s CINC score decreases continuously through the Vietnam War, which the US lost. At the end of WWI, Russia’s CINC dips low, but it quickly bounces back to its pre-WWI level. During the Vietnam War, Russia supported North Vietnam, and its CINC score rises above that of the US as they start to gain ground in the war. Russia’s CINC score also drops towards the end of the Cold War, as the satellite states start gaining their independence and the USSR is dissolved.

To explore the trends more, the data was refined to only look at the major participants of each war. For example, with WWI we looked at the CINC values for major players in the Allied Powers and Central Powers a few years before and after the war. We replicated this process for all the events listed above. Below are CINC graphs using this approach for WWI and WWII.

allied <- c("United States of America", "United Kingdom", "Russia", "Japan", "Italy")
allied_ccode <- member_alliances$ccode[match(allied, member_alliances$state_name)]
central <- c("Germany", "Turkey", "Austria-Hungary", "Romania", "Bulgaria")
central_ccode <- member_alliances$ccode[match(central, member_alliances$state_name)]
WWI_range = c(1904:1930)
WWI<- filter(NMC, NMC$year %in% WWI_range)
alliedP<- filter(WWI, WWI$ccode %in% allied_ccode)
alliedP$side = "Allied Powers"
for( i in c(1:length(allied_ccode))){
  alliedP$country[alliedP$ccode == allied_ccode[i]] <- paste(allied[i],"(Allied)", sep = " ")
}
centralP<- filter(WWI, WWI$ccode %in% central_ccode)
centralP$side = "Central Powers "
for( i in c(1:length(central_ccode))){
  centralP$country[centralP$ccode == central_ccode[i]] <- paste(central[i], "(Central)", sep = " ")
}
WWI <-rbind(alliedP, centralP)
ww1 <- ggplot() + 
  labs(color ='Country')+
  xlab("Year") +
  ylab("CINC")+
  geom_rect(data=d, mapping=aes(xmin=1914, xmax=1918, ymin=0, ymax=.4),alpha=0.05, fill ="salmon") +
  geom_line(data = WWI, aes(x = year, y = cinc, color = country, group = country)) + 
  facet_wrap(~side) + 
  theme_classic()+
  scale_color_brewer(palette="Paired")+
  ggtitle("CINC Score: WWI Major Players ")+
   theme(plot.title = element_text(hjust = .5),legend.position="right")
allies <- c("United States of America", "United Kingdom", "France", "Russia", "Australia","China")
allies_ccode <- member_alliances$ccode[match(allies, member_alliances$state_name)]
axis <- c("Germany", "Italy", "Japan", "Hungary", "Romania", "Bulgaria")
axis_ccode <- member_alliances$ccode[match(axis, member_alliances$state_name)]
WWII_range = c(1934:1950)
WWII<- filter(NMC, NMC$year %in% WWII_range)
alliedP2<- filter(WWII, WWII$ccode %in% allies_ccode)
alliedP2$side = "Allies"
for( i in c(1:length(allies_ccode))){
  alliedP2$country[alliedP2$ccode == allies_ccode[i]] <- paste(allies[i], "(Allies)", sep = " ")
}
axisP<- filter(WWII, WWII$ccode %in% axis_ccode)
axisP$side = "Axis"
for( i in c(1:length(axis_ccode))){
  axisP$country[axisP$ccode == axis_ccode[i]] <- paste(axis[i], "(Axis)", sep = " ")
}
WWII <-rbind(alliedP2, axisP)
ww2 <- ggplot() + 
  labs(color ='Country')+
  xlab("Year") +
  ylab("CINC")+
  geom_rect(data=d, mapping=aes(xmin=1939, xmax=1945, ymin=0, ymax=.4),alpha=0.05, fill ="paleturquoise3") +
  geom_line(data = WWII, aes(x = year, y = cinc, color = country, group = stateabb)) + 
  facet_wrap(~side) + 
  theme_classic()+
  scale_color_brewer(palette="Paired")+
  ggtitle("CINC Score: WWII Major Players ")+
  theme(plot.title = element_text(hjust = .5),legend.position="right")

grid.arrange(ww1, ww2, nrow=2)

As mentioned, with WWI the US’s CINC spiked right after the war and then steadily decreased until the beginning of WWII. Russia’s CINC dropped but rose again fairly quickly and stayed on a relatively upward trend until WWII. Unlike the US and Russia, the United Kingdom’s CINC decreased steadily after the war. Italy’s and Japan’s CINC remained steady. Among the Central Powers, Germany’s CINC dropped after WWI and did not rise again. Turkey’s, Romania’s and Bulgaria’s CINC remained steady. Austria-Hungary’s CINC disappears after the war because the Austro-Hungarian Empire was dissolved at the end of the war.

With WWII, we see a similar pattern for the US: the CINC reaches a peak at the end of WWII and steadily decreases until the Korean War. Russia also shows a pattern similar to WWI: its CINC score reaches a low point towards the end of WWII and then increases steadily until the Korean War. The UK follows a similar pattern as well: its CINC peaks right after the war and then steadily decreases throughout the Cold War period. Among the Axis powers, Germany’s and Japan’s CINC drop off.

To explore the patterns above, we looked into the components that make up the CINC. We chose to focus on the major powers because they had the most drastic changes during this time period. We looked at both the actual value and the ratio: the absolute values gradually increased over time, whereas the ratios show performance relative to the other countries in each year. Looking at the ratios helped us see trends that were not easy to spot in the overall values. Below is a plot of three of the six CINC components (iron and steel production, military expenditures and military personnel), showing both the values and the ratios.
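
The yearly ratios described above can also be computed without an explicit loop. The following is a minimal base R sketch on made-up numbers, assuming a table with one row per country and year; `nmc_toy` and `irst_ratio` are hypothetical names.

```r
# Toy table: two hypothetical countries over two years (made-up numbers)
nmc_toy <- data.frame(
  ccode = c(1, 2, 1, 2),
  year  = c(1950, 1950, 1951, 1951),
  irst  = c(80, 20, 90, 30)
)

# ave() applies sum() within each year and returns a vector aligned
# with the original rows, so the division yields each country's share
nmc_toy$irst_ratio <- nmc_toy$irst / ave(nmc_toy$irst, nmc_toy$year, FUN = sum)
nmc_toy
```

In this toy example country 1 increases its production from 80 to 90, yet its ratio falls from 0.80 to 0.75, which is the kind of trend that only the ratios reveal.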

grid.arrange(arrangeGrob(ias, ias_r, mex, mex_r, mip, mip_r + theme(plot.title = element_text(hjust = .5), legend.position="none"),nrow=3, ncol =2),mylegend, heights=c(10,1))

#grid.arrange(ias, ias_r, mex, mex_r, mip, mip_r, urp, urp_r, top, top_r, nrow=6)

The iron and steel production ratio follows the same pattern as the CINC for the US during this time period. The ratio peaks towards the end of WWI and WWII and decreases in the period between the two wars. Until about the beginning of the Vietnam War, roughly 1955, the US dominated the world in iron and steel production, so this value had a huge impact on the US’s CINC score. Generally, during most wars, the US has the most military production, but it lost its position towards the end of the Vietnam War, when Russia surpassed the US. The United Kingdom maintained its iron and steel production, but since the USA and Russia were increasing theirs, the UK’s ratio has been steadily decreasing since the 1900s, very similar to the pattern observed with the CINC score.

Looking at military expenditure, we see that the US and Russia invested significantly more in the military throughout the Cold War. On the other hand, both countries decreased their military personnel after it peaked at the end of WWII. These findings are consistent with the Cold War, in which the US and Russia were in an arms race: they invested heavily in military technology but did not engage in any large-scale battles. The US’s military expenditures ratio peaks in the same year as its CINC score and iron and steel production ratio.

In the few years before WWII, Germany’s military expenditures ratio increased quite rapidly, and its military personnel ratio rose drastically in the year just before the war. Although not as drastic, Japan and China follow a similar pattern: military investment increased significantly a few years prior to WWII and the Korean War, respectively, and the military personnel ratio drastically increased right before the wars. This suggests that in the years before the wars, these countries started investing in and preparing their militaries for war. For the countries on the reactive side (the US, Russia and the UK), the military production and military expenditures ratios only increased during the war.

NMC_v<- filter(NMC_ratios, NMC_ratios$year %in% c(1970:2000))
NMC_ratios_v<- filter(NMC_ratios, NMC_ratios$year %in% c(1970:2000))
#names <- c("North Korea", "South Korea", "Afghanistan"  , "Vietnam", "Republic of Vietnam")
#code<- c(731, 732,700, 816, 817)
names <- c("Iraq")
code <- c(645)
NMC_v<- filter(NMC_v, NMC_v$ccode %in% code)
NMC_ratios_v<- filter(NMC_ratios_v, NMC_ratios_v$ccode %in% code)
for( i in c(1:length(code))){
  NMC_ratios_v$country[NMC_ratios_v$ccode == code[i] ]<- names[i]
  NMC_v$country[NMC_v$ccode == code[i] ]<- names[i]
}
a<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Ratio") + 
  geom_line(data = NMC_ratios_v, aes(x = year, y = milex), colour = "indianred3") +
  ggtitle("Iraq Military Expenditures Ratio") + 
  geom_rect(data=d, mapping=aes(xmin=1990, xmax=1991, ymin=0, ymax=.025),alpha=0.03) +
  theme_classic() +
  theme(plot.title = element_text(hjust = .5, size =10 ),   axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")
b<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Ratio") +
  geom_line(data = NMC_ratios_v, aes(x = year, y = milper), colour= "royalblue3") +
  ggtitle("Iraq Military Personnel Ratio") + 
  geom_rect(data=d, mapping=aes(xmin=1990, xmax=1991, ymin=0, ymax=.05), alpha=0.03) +
  theme_classic() +
  theme(plot.title = element_text(hjust = .5, size =10 ),   axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")
grid.arrange(a,b,nrow = 1)

We wanted to see if this same pattern was present in other conflicts. In the Gulf War, Iraq invaded Kuwait and was met with international condemnation, and the US and other nations joined forces to stop Iraq. Before the invasion, we see Iraq’s military expenditures and personnel ratios increasing. The gray shaded box indicates the time period of the Gulf War (1990-1991). In the 1980s, Iraq’s military expenditure ratio increased drastically, and then there was a sudden spike in military personnel right before the start of the war.
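
A pre-war build-up like the one described above could be quantified by comparing the average yearly change in a ratio just before a war with the change in earlier years. This is only a sketch on made-up numbers: `ratio_toy`, `war_start` and `window` are hypothetical, and the three-year window is an arbitrary choice.

```r
# Toy yearly expenditure ratios for one hypothetical country (made-up numbers)
ratio_toy <- data.frame(
  year  = 1985:1990,
  milex = c(0.010, 0.011, 0.012, 0.016, 0.020, 0.024)
)
war_start <- 1990  # first year of the conflict
window    <- 3     # number of pre-war years to inspect (arbitrary)

# Year-over-year change in the ratio (NA for the first year)
ratio_toy$change <- c(NA, diff(ratio_toy$milex))

# Average change in the window just before the war versus earlier years
pre_war <- ratio_toy$year >= war_start - window & ratio_toy$year < war_start
earlier <- ratio_toy$year < war_start - window
mean_pre     <- mean(ratio_toy$change[pre_war], na.rm = TRUE)
mean_earlier <- mean(ratio_toy$change[earlier], na.rm = TRUE)
c(pre_war = mean_pre, earlier = mean_earlier)
```

A larger pre-war average would be consistent with a build-up, but on its own it does not establish one; the comparison is descriptive only.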

ct <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Italy", "Japan", "China")
ct_ccode <- member_alliances$ccode[match(ct, member_alliances$state_name)]
NMC_ct <- filter(NMC, NMC$year %in% all_year)
NMC_ct <- filter(NMC_ct, NMC_ct$ccode %in% ct_ccode)
for( i in c(1:length(ct_ccode))){
  NMC_ct$country[NMC_ct$ccode == ct_ccode[i]] <- ct[i]
}
a<- ggplot(NMC_ct, aes(country, year, fill = cinc)) + geom_tile()+
  xlab("Country") +
  ylab("Year") +
  scale_fill_viridis() + 
  ggtitle("CINC Heatmap for Major Powers") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5))+ 
  coord_flip() +
  labs(fill ='CINC')
topCas <- c("Netherlands", "Yugoslavia", "Lithuania", "Poland", "Austria", "Hungary", "Romania", "Estonia", "Luxembourg")
topCas_ccode <- member_alliances$ccode[match(topCas, member_alliances$state_name)]
NMC_cas <- filter(NMC, NMC$year %in% all_year)
NMC_cas <- filter(NMC_cas, NMC_cas$ccode %in% topCas_ccode)
for( i in c(1:length(topCas_ccode))){
  NMC_cas$country[NMC_cas$ccode == topCas_ccode[i]] <- topCas[i]
}
b<- ggplot(NMC_cas, aes(country, year, fill = cinc)) + geom_tile()+
  xlab("Country") +
  ylab("Year") +
  scale_fill_viridis() + 
  ggtitle("CINC Heatmap for Countries with the most Holocaust Casualties") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip() +
  labs(fill ='CINC')
  
grid.arrange(a,b, nrow = 1 )

Finally, using the heat maps, we looked at the impact on CINC after a war. On the left is a heat map of the major powers. You can see that France, Germany and Japan have missing CINC values at the end of World War II. Those periods of missing values correspond to the recovery period for each of those countries following the war. Next, we looked to see which other countries had gaps in their CINC data. Most of the countries with gaps are Eastern European countries that were heavily impacted by the Holocaust and by the struggle to end Communism in the region. This indicates that countries going through intense destruction or reformation do not have any CINC information.

Next Steps

To get into more detail and context around certain characteristics, we want to look at specific events, such as Pearl Harbor or Japan’s invasion of Malaya, and other pivotal moments in wars to see how those events impact the CINC and its various components.

A problem with NMC is that many factors beyond the 6 NMC factors determine the power of a nation. One major consideration that is not taken into account is diplomatic relations and dealings. Diplomatic relations play a major role in the prevention and conclusion of conflicts. With this data, it was not possible to factor that in.

Another thing to consider is the difference in policies between countries. We see that military expenditures have been increasing for the US since the Cold War, but Russia’s military expenditures take a sudden drop at the end of the Cold War. Since then, Russia has been cutting military spending. Even with its participation in the Afghanistan War, Russia’s military expenditures have not increased. In the US today, by contrast, politicians are proposing federal budgets with increases in military spending. This difference is due to differences in the policies of the two countries. Thus the reactions of countries to events will vary drastically based on their policies, and it becomes hard to distinguish an overall pattern.

Another drawback of NMC is that it cannot take into account changes in universal priorities. For example, with increased concern for climate change and scarce natural resources, and with advancements in technology, iron and steel production might decrease drastically in the future, so it may no longer be a valid measure of power. Similarly, advancements in technology would decrease the need for military personnel. The issue with NMC is that it cannot take such policy concerns and changes into consideration when measuring national power.

Alliances

# Restrict to 1900-2012 and add helper columns (variable name spelled consistently)
dyad_al_year <- filter(dyad_al_year, dyad_al_year$year %in% c(1900:2012))
dyad_al_year$length = dyad_al_year$dyad_end_year - dyad_al_year$dyad_st_year
dyad_al_year$conflict <- "0"
dyad_al_year$count <- 1 
dir_alliances <- gather(dir_al_year, treaty_type, idicator, defense:entente)
dir_alliances <- dir_alliances[!dir_alliances$idicator %in%  0,]
dir_alliances$dyad_end_year[dir_alliances$dyad_end_year %in% NA] = 2016
dir_alliances <- dir_alliances[dir_alliances$year>1900,]
alliance_count <- dir_alliances[, c(2,3,14,16)]
alliance_count$count <- 1 
gp_ct <- aggregate(cbind(count) ~ ccode1+state_name1+year+treaty_type, data =alliance_count, FUN = sum )
gp_ct$Conflict <- "0"
for(i in c(1:length(d$x1))){
  gp_ct[gp_ct$year >= d$x1[i] & gp_ct$year <= d$x2[i], length(gp_ct)] <- as.character(i)
}

WWI was triggered by the assassination of Archduke Franz Ferdinand of Austria. His death set off a diplomatic crisis in which countries that were not involved in the original conflict were forced to get involved. Once Austria declared war on Serbia over the death of the Archduke, Russia had to step in to defend Serbia. Once Russia entered the conflict, Germany was forced to enter due to its alliance with Austria. During the conflict Germany invaded Belgium; in response, the United Kingdom mobilized due to its alliance with Belgium. This pattern continued until all the major powers of the world were involved in a devastating war. These interlocking alliances helped turn a regional crisis into World War I. Since then the number of alliances has only grown and continues to grow. For this reason, we wanted to look at alliances and see how they change during wars.

Below is a boxplot of the total number of alliances in effect each year between any two countries. It is easy to see that the median number of alliances jumped significantly during WWII, continued to grow during the Cold War and has remained relatively level since then. An interesting pattern is that the median number of alliances increased more in the 1-3 years before the end of a war. You can see this pattern with WWI, the Korean War, the Vietnam War and the end of the Cold War. Although the Cold War was only a state of severe political tension, there were many regional battles and the threat of a large-scale military war was constant. The number of alliances increased significantly from the start of the Cold War to its end.

ggplot() +
  xlab("Year") +
  ylab("Count")+
  geom_boxplot(data = gp_ct, aes(x = as.factor(year), y = count, fill = Conflict)) +
  ggtitle("Total Alliances by Year") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = .5, size = 20), axis.text.x = element_text(angle = 90, size = 10),  axis.text.y = element_text(size = 15),  axis.title.y = element_text(size = 15), axis.title.x = element_text(size = 15),  legend.position="bottom", legend.text = element_text(size=15),  legend.title = element_text(size=15))+
  scale_fill_manual(values=c("white", "lightsteelblue3", "pink3", "paleturquoise3", "lightsteelblue2", "lightsteelblue4","salmon" ), labels = mylables)

Next we looked at the types of alliances formed during this time. The COW data reports on 4 types of alliances: defense, neutrality, entente and non-aggression. In a defense alliance, the member states agree to defend one or more states in the alliance in the event of a conflict. With a neutrality alliance, there is an agreement to maintain neutrality towards the members of the alliance. In a non-aggression alliance, the members agree to take no military action against one another. Finally, with an entente alliance there is an understanding that the states would consult with one another if a crisis occurred.

The plots below show the number of alliances by alliance type. The top row shows the number of new alliances formed each year, and the second row shows the number of alliances terminated each year. Please note that if an alliance was formed between 4 states, there would be 6 new alliances in the data set, because there is an alliance between each pair of the 4 members. Similarly, if an alliance between 4 states were terminated, that would be 6 fewer alliances.
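
The pair counting in the note above follows from the number of unordered pairs among n members, n(n-1)/2. A short base R sketch with hypothetical member labels:

```r
# Number of dyads (unordered pairs) among n alliance members
n_members <- 4
n_dyads <- choose(n_members, 2)
n_dyads

# The pairs themselves, one row per dyad, using hypothetical member labels
members <- c("A", "B", "C", "D")
dyads <- t(combn(members, 2))
dyads
```

If a table stores directed dyads instead (each pair listed once in each direction), the same alliance would contribute twice as many rows.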

dir_al_0 <- filter(dir_al, dir_al$dyad_st_year %in% c(1900:2012))
all_st <- gather(dir_al_0, treaty_type, idicator, defense:entente)
all_st <- all_st[!all_st$idicator %in%  0,]
all_st$dyad_end_year[all_st$dyad_end_year %in% NA] = 2016
al_st_count <- all_st[, c(3,5,8,11,15)]
al_st_count$count <- 1 
gp_st <- aggregate(cbind(count) ~ dyad_st_year+treaty_type, data =al_st_count, FUN = sum )
a <- ggplot() +
  xlab("Year") +
  ylab("Count")+
  geom_bar(data = al_st_count, aes(x = dyad_st_year)) +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=450, fill=Conflict),alpha=0.2)+
  ggtitle("Number of Alliances Formed") +
  theme_classic()+
  theme(legend.position="bottom", plot.title = element_text(hjust = .5)) + 
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) + facet_wrap(~treaty_type, nrow = 1 )
b<- ggplot() +
  xlab("Year") +
  ylab("Count")+
  geom_bar(data = al_st_count[al_st_count$dyad_end_year < 2016, ], aes(x = dyad_end_year)) +
  ggtitle("Number of Alliances Terminated")+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(450), fill=Conflict),alpha=0.2)+
  theme_classic()+
  theme(legend.position="bottom", plot.title = element_text(hjust = .5)) + 
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) + facet_wrap(~treaty_type, nrow = 1)
grid.arrange(a,b, nrow =2)

Most of the alliances were formed at the end of WWII and during the Cold War. The most frequently formed alliances were defense and entente. Surprisingly, in years that see a large increase in the number of alliances formed, there is also an increase in the number of alliances that were terminated. To get a better understanding of what types of treaties were formed and why they ended, we looked at individual countries.

The following charts are all organized the same way. They show a timeline from when each alliance started until either the end of the alliance (shown in red) or 2012 if the alliance was observed in effect as of December 31, 2012 (shown in blue). The charts are faceted by alliance type because many of the alliance types overlap. For example, one alliance could be both a defense and an entente alliance, so to get a better visual representation we separated the types of alliances. We focused this part of the analysis on the United States, United Kingdom, Russia, and Germany because these countries are major powers and thus are involved in many of the military alliances throughout history.

United States of America

```{r fig.width=7.5, fig.height = 5}

al_us_yr <- filter(dir_al_year, dir_al_year$state_name1 %in% "United States of America")
al_us_yr <- gather(al_us_yr, Treaty, idicator, defense:entente)   
al_us_yr <- al_us_yr[!al_us_yr$idicator %in%  0,]
# Alliances with no recorded end year get a placeholder end year of 2016
al_us_yr$dyad_end_year[al_us_yr$dyad_end_year %in% NA] = 2016
al_us_yr <- al_us_yr[, c(5,8,11,14,16)]
al_us_yr$count = 1 
al_us_yr$Status <- ""
al_us_yr$Status[al_us_yr$dyad_end_year < 2012] <- "Ended"
# An end year of 2012 or the 2016 placeholder both mean the alliance is still in effect
al_us_yr$Status[al_us_yr$dyad_end_year >= 2012] <- "Ongoing"
ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_us_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("US Alliances ")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 
```

Looking at the alliances for the US, we see that most defense alliances are still in effect today. There were a handful of alliances with South American countries that ended towards the end of WWII, but the US entered into a different alliance with those same countries immediately. The treaty in effect with the South American countries is the Inter-American Treaty of Reciprocal Assistance (Rio Pact), under which an attack against one country is considered an attack against all the countries of the Americas in the alliance. This alliance was signed in 1947 and continues today.

You can also see a similar pattern of ongoing alliances for the defense, entente and non-aggression treaty types. NATO, a defense, entente and non-aggression alliance, was formed in 1949 and is still in effect today. NATO involves 28 countries and accounts for the high number of alliances formed in 1949 for the 3 types.

The entente alliances follow a similar pattern where the alliance ended and was immediately reformed. For a few countries, an entente alliance was formed towards the end of the Korean War and ended a few years after the end of the Vietnam War. The majority of the countries that follow this pattern are in Asia or Australia. This is reasonable considering they were participants in the Vietnam War. The defense and entente alliance between the US and Cuba also ended during the Vietnam War, as indicated in the graph above, when Cuba was providing military support to the Vietnamese. In addition, during the Vietnam War there was a neutrality alliance for a few years between the US and countries that participated in the war. This alliance, the International Agreement on the Neutrality of Laos, started in 1961 and was terminated when the Democratic Republic of Vietnam violated the terms of the treaty 2 years later.

United Kingdom

```{r fig.width=7.5, fig.height = 5}
dir_al_year2 <- filter(dir_al_year, dir_al_year$dyad_st_year %in% c(1900:2016))
al_gb_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("United Kingdom"))
al_gb_yr <- gather(al_gb_yr, Treaty, idicator, defense:entente)   
al_gb_yr <- al_gb_yr[!al_gb_yr$idicator %in%  0,]
al_gb_yr <- al_gb_yr[, c(3,5,8,11,14,16)]
al_gb_yr$count = 1 
al_gb_yr$Status[al_gb_yr$dyad_end_year < 2012] <- "Ended"
al_gb_yr$Status[al_gb_yr$dyad_end_year== 2012] <- "Ongoing"
# Plot the UK alliance timeline, one panel per treaty type
ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_gb_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("United Kingdom Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 
```

The United Kingdom follows the same pattern as the United States. Most of the defense alliances started towards the end of WWII and are still in effect today. The United Kingdom is part of NATO, so we see a high number of defense, entente and non-aggression alliances that were formed in 1949 and continue today. The UK was also part of the Neutrality of Laos alliance, and we can see that in the neutrality plot.

Germany

```{r fig.width=7.5, fig.height = 5}
al_gmy_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("Germany"))
al_gmy_yr <- gather(al_gmy_yr, Treaty, idicator, defense:entente)   
al_gmy_yr <- al_gmy_yr[!al_gmy_yr$idicator %in%  0,]
al_gmy_yr <- al_gmy_yr[, c(5,8,11,14,16)]
al_gmy_yr$count = 1 
al_gmy_yr$Status[al_gmy_yr$dyad_end_year < 2012] <- "Ended"
al_gmy_yr$Status[al_gmy_yr$dyad_end_year== 2012] <- "Ongoing"
# Plot the German alliance timeline, one panel per treaty type
ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_gmy_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("Germany Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 
```

As Germany was heavily impacted by WWII, we see all its alliances end by the end of the war. During its recovery period, Germany was not a part of any military alliance. This is to be expected, since Germany was so devastated by the war that it could not afford to maintain a military. Germany was also going through a period of civil unrest. In 1990, Germany joined NATO after East and West Germany combined to form a unified Germany.

Russia

```{r fig.width=7.5, fig.height = 5}
al_rus_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("Russia"))
al_rus_yr <- gather(al_rus_yr, Treaty, idicator, defense:entente)   
al_rus_yr <- al_rus_yr[!al_rus_yr$idicator %in%  0,]
al_rus_yr <- al_rus_yr[, c(5,8,11,14,16)]
al_rus_yr$count = 1 
al_rus_yr$Status[al_rus_yr$dyad_end_year < 2012] <- "Ended"
al_rus_yr$Status[al_rus_yr$dyad_end_year== 2012] <- "Ongoing"
# Plot the Russian alliance timeline, one panel per treaty type
ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_rus_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("Russia Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 
```

China

```{r fig.width=7.5, fig.height = 5}
al_china_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("China"))
al_china_yr <- gather(al_china_yr, Treaty, idicator, defense:entente)   
al_china_yr <- al_china_yr[!al_china_yr$idicator %in%  0,]
al_china_yr <- al_china_yr[, c(5,8,11,14,16)]
al_china_yr$count = 1 
al_china_yr$Status[al_china_yr$dyad_end_year < 2012] <- "Ended"
al_china_yr$Status[al_china_yr$dyad_end_year== 2012] <- "Ongoing"
# Plot the Chinese alliance timeline, one panel per treaty type
ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_china_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("China Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 
```

Conclusions

Since we are mainly focusing on large-scale wars that involved various countries, many treaties were created and broken. For example, the Warsaw Pact was created as a counterweight to NATO, which was created after the end of WWII. The US, Great Britain and their allies became part of NATO, and the Soviet Union and its allies became part of the Warsaw Pact. Once the USSR dissolved, many of the satellite nations that had been Warsaw Pact members joined NATO. With NATO and the formation of the United Nations, it is hard to say which countries will participate in the next war. For example, before the beginning of the War in Afghanistan, the security council had to authorize the United States and NATO allies to organize an offensive against al-Qaeda. This type of regulation makes it hard to determine how future wars will play out. One thing that was interesting is that once a treaty falls apart, the members try to join another treaty, which is why we see spikes in the median number of alliances towards the end of the wars.

One of the things that made this data set hard to work with is that it was impossible to tell which alliances were part of a larger treaty. For example, if there was a data point for an alliance between the US and the UK in 1967, there was no indication of whether it was NATO or some other treaty. This also made it hard to tell when a country joined an existing alliance. For example, when Germany joined NATO there were data points for an alliance between Germany and the NATO members, but it is not easy to discern that Germany joined NATO without some internet research.

The other drawback of this data set is that it only considers formal military alliances. It does not take into account other types of alliances, such as the United Nations and its Security Council or a trade agreement. For example, Japan is not in any military alliance currently but it does have very close ties to the United States today, and this information is not captured in the data set.

Trade

Militarized Interstate Disputes

Conclusion

Discuss limitations and future directions, lessons learned

Sources

---
title: "Tale of War"
output: html_notebook
author: "Cynthia Clement & Vineet Aguiar"
---
```{r, echo = FALSE}

## Load Packages 

library(ggplot2)
library(grid)
library(gridExtra)
library(tidyr)
library(dplyr)
library(viridis)
library(gtools)
library(RColorBrewer)


##import data

#NMC

#NMC_orig = read.csv("/Users/cynthiaclement/EDAV_Project_2017/data/NMC_5_0/NMC_5_0.csv", sep= ",")
NMC_orig = read.csv("./data/NMC_5_0/NMC_5_0.csv", sep= ",")

#Alliances 
dyad_al_0= read.csv("./data/version4.1_csv/alliance_v4.1_by_dyad.csv", sep= ",")
dyad_al_year = read.csv("./data/version4.1_csv/alliance_v4.1_by_dyad_yearly.csv", sep= ",")

member_alliances = read.csv("./data/version4.1_csv/alliance_v4.1_by_member.csv", sep= ",")
member_al_year = read.csv("./data/version4.1_csv/alliance_v4.1_by_member_yearly.csv", sep= ",")

dir_al_year= read.csv("./data/version4.1_csv/alliance_v4.1_by_directed_yearly.csv", sep= ",")
dir_al= read.csv("./data/version4.1_csv/alliance_v4.1_by_directed.csv", sep= ",")

## Events Data Frame 

d=data.frame(x1=c(1914,1939, 1947, 1950, 1955, 2001), x2=c(1918, 1945, 1991, 1953, 1975, 2010), Conflict=c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"), r=c(1,2,3,4,5,6))

mylables <- c("No War" , "WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War")
```


## Introduction

**Overview**  
Wars are complex events born of geopolitical, cultural or economic strife, oftentimes spanning many years but ultimately costing us lives, our livelihood and peace. During wars, countries quickly adopt ideologies, form allegiances, and discipline their economic and scientific priorities while maintaining their military focus with a blind adherence. Although the causes of this displacement of peace may vary, is there a precursory pattern to it? Does the landscape change after the end of a prolonged conflict? Do certain actors benefit more? Do some lose more than others? And most importantly, could there be important predictors of these epic events that change the course of history? 

**Our Aim**  
We are particularly interested in studying the changes that happen to a country before and after it enters a war. We want to see the change of alliances and strategies, its impact on trade and commerce, and the economics at play. We also want to compare and contrast the characteristics of countries that won wars with the ones that lost. Our eventual goal is to find certain factors that indicate which countries will enter into a war and how these factors/predictors change over time. 

**Scope and Timeline**  
To limit our scope we will explore the data with a particular emphasis on the United States of America and the wars it has fought since 1900. At various points, we may have to include comparisons between countries and the US and we will explore the data breadth-wise to draw meaningful insights.  

For our timeline, we plan to look at events/activity leading to, during and following the major US wars, namely:  
WWI    -------------------------- 1914-1918  
WWII  ------------------------- 1939-1945  
Cold War -------------------- 1947-1991  
Korean War ---------------- 1950-1953  
Vietnam War --------------- 1955-1975  
War in Afghanistan ------- 2001-2010 


## The Data, Team Members and The Roles

The [Correlates of War Project](http://www.correlatesofwar.org) is a treasure trove of information. We have a special interest in the following datasets: Trade, National Materials Capabilities, Alliances and Militarized Interstate Disputes. 

**Our Plan**  
We've decided to divide and conquer the work by each taking a subset of the data and exploring it. After some time, we will regroup to see what we've learnt so far and switch data sets amongst ourselves to see if there are more insights to be learnt or different approaches to visualize the existing data. Lastly, we want to drill down into particular variables and plot correlations or predictors for the final output.

*Phase 1:*  
* Cynthia to analyze National Materials Capabilities and Alliances  
* Vineet to analyze Militarized Interstate Disputes and Trade  

*Phase 2:*  
* We are going to switch the data sets we are looking at to see if the other person can discern any new insights or creative ways of presenting the data.  
  + Cynthia to analyze Militarized Interstate Disputes and Trade  
  + Vineet to analyze National Materials Capabilities and Alliances   

*Phase 3:*
* Cynthia and Vineet to come together and look at the interaction of different variables. For example, how did a change in trade impact the NMC? 


## Analysis of Data Quality 

**Provide a detailed, well-organized description of data quality, including textual description, graphs, and code. **

The Correlates of War datasets are a product of the Correlates of War Project (COW), founded in 1963. COW's goal is to "facilitate the collection, dissemination, and use of accurate and reliable quantitative data in international relations." [add source] From the COW datasets we focused on 4 datasets: NMC, MID, Alliances and Trade. 

**Overall**

*in this section we can talk about Consistency, conformity and integrity*


Overall the consistency of these data sets is quite good. We have not found any evidence of conflicting information.



*for each individual data set address accuracy, completeness, and duplication*

**National Materials Capabilities**
  
The overall data quality of the NMC dataset is very good. There are roughly 14,000 entries and 89% of them did not have any missing values. There is one data entry for each country per year. The accuracy of the data is also very good because as countries are dissolved and new ones are formed, this data keeps track of them. For example, the graph depicts the CINC of Austria-Hungary from 1900 to 1918, the end of WWI, when the Austro-Hungarian Empire was dissolved. Immediately after that, you see data points for Austria and Hungary separately. This same accuracy holds true for many different countries, where there is only data once the country has declared independence or has just been created. 

```{r fig.width=10, fig.height=5}
NMC_test <- NMC_orig
NMC_test$cinc[NMC_test$cinc == -9] <- NA
NMC_test$irst[NMC_test$irst == -9] <- NA
NMC_test$milex[NMC_test$milex == -9] <- NA
NMC_test$milper[NMC_test$milper== -9] <- NA
NMC_test$pec[NMC_test$pec == -9] <- NA
NMC_test$tpop[NMC_test$tpop == -9] <- NA
NMC_test$upop[NMC_test$upop == -9] <- NA

NMC_test <- filter(NMC_test, NMC_test$year %in% c(1900:2007))
test <- c("Austria-Hungary", "Austria", "Hungary")
test_ccode <- member_alliances$ccode[match(test, member_alliances$state_name)]

NMC_test <- filter(NMC_test, NMC_test$ccode %in% test_ccode)

ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="CINC") +
  labs(color ='Country Abbreviation')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=.09, fill=Conflict),alpha=0.15) +
  geom_line(data = NMC_test, aes(x = year, y = cinc, color = stateabb, group = stateabb)) +
  ggtitle("CINC for Austria-Hungary") + 
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5), legend.position = "bottom") 

```
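The completeness figure quoted above can be checked along these lines. This is a sketch on a small made-up data frame rather than the real file; it assumes, as in the chunk above, that `-9` marks a missing value, and the helper name `share_complete` is ours.

```{r}
# Share of rows with no missing values, treating -9 as the missing-value code.
share_complete <- function(df, cols, missing_code = -9) {
  vals <- df[, cols, drop = FALSE]
  # Recode the missing-value marker to NA column by column
  vals[] <- lapply(vals, function(x) replace(x, x %in% missing_code, NA))
  mean(complete.cases(vals))
}

# Toy data for illustration only (not COW values)
toy <- data.frame(
  ccode = c(2, 2, 200, 200),
  year  = c(1900, 1901, 1900, 1901),
  milex = c(10, -9, 5, 7),
  tpop  = c(100, 101, -9, 52)
)

share_complete(toy, c("milex", "tpop"))  # 0.5: two of the four rows are complete
```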
  
**Alliances**

The overall data quality of the data set was very good. The information in the alliances data matched historical facts.

**Graph of NATO Alliance and when the members joined**


version id == 227

Some of the inconsistencies that I noticed are in cases where the alliance is still in effect as of 12/31/2012, which was when this data set was last updated. In some of the datasets, if the alliance is ongoing, the dyad_end_year field (the year in which the alliance was terminated) is set to 2012, and in other data sets it is set to ‘NA’. Where dyad_end_year is set to 2012, it was hard to know whether an alliance ended in 2012 or was still ongoing. 
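One way to work around this inconsistency is to map both encodings to a single status before plotting. The sketch below assumes that an end year of `NA` or 2012 means the alliance was still in effect when the data was last updated; that assumption misclassifies any alliance that really did end in 2012. The function name `alliance_status` is ours.

```{r}
# Label an alliance as ongoing when its end year is missing or not before
# the last year observed in the data; otherwise label it as ended.
alliance_status <- function(dyad_end_year, last_obs_year = 2012) {
  ongoing <- is.na(dyad_end_year) | dyad_end_year >= last_obs_year
  ifelse(ongoing, "Ongoing", "Ended")
}

alliance_status(c(1945, 2012, NA, 1991))
# "Ended" "Ongoing" "Ongoing" "Ended"
```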


3) US had only one alliance before 1925 and Japan has no more alliances now

```{r fig.width=10, fig.height=8}
qa <- filter(dir_al_year, dir_al_year$state_name1 %in% c("United States of America"))
qa <- qa[, c(3,5,8,11,18)]
qa$dyad_end_year[qa$dyad_end_year %in% NA] = 2016
#qa <- gather(qa, st_ed, year, dyad_st_year:dyad_end_year )

ggplot() + 
  geom_point(data =qa, aes(x=year, y = state_name2), alpha = .5) +
  xlab("Year") +
  ylab("Country")+
  ggtitle("")+
  theme_classic() 

```
  
  
**Trade** 
   
     
**Militarized Interstate Disputes** 



*Below for reference only - this is what we want to talk to in this section*



Completeness: Is all the requisite information available? Are data values missing, or in an unusable state? In some cases, missing data is irrelevant, but when the information that is missing is critical to a specific business process, completeness becomes an issue. 

Conformity: Are there expectations that data values conform to specified formats? If so, do all the values conform to those formats? Maintaining conformance to specific formats is important in data representation, presentation, aggregate reporting, search, and establishing key relationships.

Consistency: Do distinct data instances provide conflicting information about the same underlying data object? Are values consistent across data sets? Do interdependent attributes always appropriately reflect their expected consistency? Inconsistency between data values plagues organizations attempting to reconcile between different systems and applications. 

Accuracy: Do data objects accurately represent the “real-world” values they are expected to model? Incorrect spellings of product or person names, addresses, and even untimely or not current data can impact operational and analytical applications. 

Duplication: Are there multiple, unnecessary representations of the same data objects within your data set? The inability to maintain a single representation for each entity across your systems poses numerous vulnerabilities and risks. 

Integrity: What data is missing important relationship linkages? The inability to link related records together may actually introduce duplication across your systems. Not only that, as more value is derived from analyzing connectivity and relationships, the inability to link related data instance together impedes this valuable analysis. 

Based on the data how strong are your observations   
  
## Executive Summary 
  
**Provide a short nontechnical summary of the most revealing findings of your analysis with no more than 3 static graphs or one interactive graph (or link), written for a nontechnical audience. The length should be approximately 2 pages (if we were using pages...) Do not show code, and take extra care to clean up your graphs, ensuring that best practices for presentation are followed.**  


**National Materials Capabilities**
```{r, echo = FALSE}
countries <- c("Germany", "Japan", "China")
country_code <- member_alliances$ccode[match(countries, member_alliances$state_name)]

NMC_execsum <- filter(NMC_orig, NMC_orig$year %in% all_year)
NMC_execsum <- filter(NMC_execsum, NMC_execsum$ccode %in% country_code)
NMC_execsum$country = ""
for( i in c(1:length(country_code))){
  NMC_execsum$country[NMC_execsum$ccode == country_code[i]] <- countries[i]
}

NMC_execsum_ratios <- filter(NMC_ratios, NMC_ratios$year %in% all_year)
NMC_execsum_ratios <- filter(NMC_execsum_ratios, NMC_execsum_ratios$ccode %in% country_code)
NMC_execsum_ratios$country = ""
for( i in c(1:length(country_code))){
  NMC_execsum_ratios$country[NMC_execsum_ratios$ccode == country_code[i]] <- countries[i]
}


mex_r_es<- ggplot() + 
  scale_x_continuous(name="") + 
  scale_y_continuous(name="Military Expenditures Ratio") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_execsum_ratios$milex), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_execsum_ratios, aes(x = year, y = milex, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  theme(plot.title = element_text(hjust = .5),legend.position="none")

mip_r_es<- ggplot() + 
  scale_x_continuous(name="") + 
  scale_y_continuous(name="Military Personnel Ratio") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_execsum_ratios$milper), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_execsum_ratios, aes(x = year, y = milper, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


top_r_es<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Total Population Ratio") +
  labs(color ='Country')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_execsum_ratios$tpop), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_execsum_ratios, aes(x = year, y = tpop, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  theme(plot.title = element_text(hjust = .5),legend.position="bottom",legend.key.size = unit(1, "cm"), legend.title=element_text(size=10) , legend.text=element_text(size=10))
  


g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)}

mylegend<-g_legend(top_r_es)


#grid_arrange_shared_legend(ias, ias_r, mex, mex_r, mip, mip_r, urp, urp_r, top, top_r, nrow = 6)


```



```{r  fig.width=15, fig.height=6 , echo = FALSE}

grid.arrange(arrangeGrob(mex_r_es,mip_r_es + theme(plot.title = element_text(hjust = .5), legend.position="none"),nrow=1, ncol =2),mylegend, heights=c(10,1))

```

**Alliances**
  
  
**Trade** 
   
     
**Militarized Interstate Disputes** 
  
  
## Main Analysis 

**Provide a detailed, well-organized description of your findings, including textual description, graphs, and code.  Your focus should be on both the results and the process. Include, as reasonable and relevant, approaches that didn't work, challenges, the data cleaning process, etc.**

#### National Materials Capabilities

```{r}
NMC <- NMC_orig 
NMC$cinc[NMC$cinc == -9| is.na(NMC$cinc)] <- 0
NMC$irst[NMC$irst == -9| is.na(NMC$irst)] <- 0
NMC$milex[NMC$milex == -9| is.na(NMC$milex)] <- 0
NMC$milper[NMC$milper== -9| is.na(NMC$milper)] <- 0
NMC$pec[NMC$pec == -9| is.na(NMC$pec)] <- 0
NMC$tpop[NMC$tpop == -9| is.na(NMC$tpop)] <- 0
NMC$upop[NMC$upop == -9| is.na(NMC$upop)] <- 0

all_year <- c(1900:2007)

NMC_ratios <- c("")
for (year_t in all_year){
  yr <- filter(NMC, NMC$year %in% year_t) 
  max <- apply(yr[, c(4:9)], 2, sum)
  #max <- as.numeric(max[4:9])
  for (i in 4:9){
    yr[,i] = as.numeric(yr[,i]/max[i-3])
  }
  NMC_ratios <- smartbind(NMC_ratios , yr)
}

NMC_ratios  <- NMC_ratios [c(2:nrow(NMC_ratios )), c(2:length(NMC_ratios))]


cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

```


National Materials Capability measures the power of a country based on 6 values: total population, urban population, military personnel, military expenditures, iron and steel production and energy consumption. NMC is purely a measure of military and economic means of influence rather than diplomacy or other forms of influence. 

CINC is the composite score to measure the power of a country using the average of the ratios, calculated as described below. 
 
![](./cinc_calc.png)
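As a rough sketch of that calculation: for each year, every component is divided by the total across all states in that year, and the six shares are then averaged. The example uses made-up numbers for two states in one year, and the function name `cinc_from_components` is ours; the column names follow the NMC file used above.

```{r}
components <- c("milex", "milper", "irst", "pec", "tpop", "upop")

cinc_from_components <- function(df, cols = components) {
  # Share of the yearly total for each component
  shares <- lapply(cols, function(col) {
    df[[col]] / ave(df[[col]], df$year, FUN = sum)
  })
  # CINC is the mean of the six component shares
  rowMeans(do.call(cbind, shares))
}

# Toy data for illustration only (not COW values)
toy_nmc <- data.frame(
  stateabb = c("AAA", "BBB"),
  year   = c(1900, 1900),
  milex  = c(30, 10),
  milper = c(3, 1),
  irst   = c(6, 2),
  pec    = c(9, 3),
  tpop   = c(75, 25),
  upop   = c(15, 5)
)

toy_nmc$cinc <- cinc_from_components(toy_nmc)
toy_nmc$cinc  # 0.75 0.25
```

Because every component share sums to 1 within a year, the resulting scores also sum to 1 within a year.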


Below is the CINC score for today's major powers that also participated in the major wars of the past. Also highlighted are the years of the wars mentioned above. 



```{r fig.width=10, fig.height=5}
countries <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Japan", "China")
country_code <- member_alliances$ccode[match(countries, member_alliances$state_name)]

NMC_mp <- filter(NMC, NMC$year %in% all_year)
NMC_mp <- filter(NMC_mp, NMC_mp$ccode %in% country_code)
NMC_mp$country = ""
for( i in c(1:length(country_code))){
  NMC_mp$country[NMC_mp$ccode == country_code[i]] <- countries[i]
}

ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="CINC") +
  labs(color ='Country')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$cinc), fill=Conflict),alpha=0.15) +
  geom_line(data = NMC_mp, aes(x = year, y = cinc, color = country, group = stateabb)) +
  ggtitle("CINC by Year for Major Powers Today") + 
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5), legend.position = "bottom") 
  



```


As you can see, CINC is constantly changing. Interestingly, the CINC spikes for the US right after WWI and WWII, where the US had a major role in the outcome of the wars. There is also an increase in the US’s CINC score during the Korean War. Following the Korean War, the US’s CINC score shows a continuous decrease during the Vietnam War, which the US lost. At the end of WWI, Russia’s CINC dips low, but it bounced back to its pre-WWI level fairly quickly. During the Vietnam War, Russia supported the Vietnamese people, and its CINC score rises above that of the US as they start to gain ground in the war. Russia’s CINC score also drops towards the end of the Cold War as the satellite states start gaining their independence and the USSR is dissolved.

To explore the trends more, the data was refined to only look at the major participants of each war. For example, with WWI we looked at the CINC values for major players in the Allied Powers and Central Powers a few years before and after the war. We replicated this process for all the events listed above. Below are CINC graphs using this approach for WWI and WWII.
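The year windows for this kind of before-and-after view can be built with a small helper. This is only a sketch: the name `war_window` and its padding arguments are ours, and the amount of padding is a judgment call (the WWI chunk below uses 1904 to 1930, the WWII chunk 1934 to 1950).

```{r}
# Years from `before` years ahead of a war to `after` years past its end.
war_window <- function(start, end, before = 5, after = 5) {
  seq(start - before, end + after)
}

wwii_years <- war_window(1939, 1945)          # 1934 to 1950
wwi_years  <- war_window(1914, 1918, 10, 12)  # 1904 to 1930
range(wwii_years)
range(wwi_years)
```

The result can be passed to `filter()` in the same way as `WWI_range` and `WWII_range`.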


```{r fig.width=10, fig.height=5, echo = TRUE}
allied <- c("United States of America", "United Kingdom", "Russia", "Japan", "Italy")
allied_ccode <- member_alliances$ccode[match(allied, member_alliances$state_name)]

central <- c("Germany", "Turkey", "Austria-Hungary", "Romania", "Bulgaria")
central_ccode <- member_alliances$ccode[match(central, member_alliances$state_name)]

WWI_range = c(1904:1930)

WWI<- filter(NMC, NMC$year %in% WWI_range)
alliedP<- filter(WWI, WWI$ccode %in% allied_ccode)
alliedP$side = "Allied Powers"
for( i in c(1:length(allied_ccode))){
  alliedP$country[alliedP$ccode == allied_ccode[i]] <- paste(allied[i],"(Allied)", sep = " ")
}

centralP<- filter(WWI, WWI$ccode %in% central_ccode)
centralP$side = "Central Powers "
for( i in c(1:length(central_ccode))){
  centralP$country[centralP$ccode == central_ccode[i]] <- paste(central[i], "(Central)", sep = " ")
}
WWI <-rbind(alliedP, centralP)

ww1 <- ggplot() + 
  labs(color ='Country')+
  xlab("Year") +
  ylab("CINC")+
  geom_rect(data=d, mapping=aes(xmin=1914, xmax=1918, ymin=0, ymax=.4),alpha=0.05, fill ="salmon") +
  geom_line(data = WWI, aes(x = year, y = cinc, color = country, group = country)) + 
  facet_wrap(~side) + 
  theme_classic()+
  scale_color_brewer(palette="Paired")+
  ggtitle("CINC Score: WWI Major Players ")+
   theme(plot.title = element_text(hjust = .5),legend.position="right")


allies <- c("United States of America", "United Kingdom", "France", "Russia", "Australia","China")
allies_ccode <- member_alliances$ccode[match(allies, member_alliances$state_name)]

axis <- c("Germany", "Italy", "Japan", "Hungary", "Romania", "Bulgaria")
axis_ccode <- member_alliances$ccode[match(axis, member_alliances$state_name)]

WWII_range = c(1934:1950)

WWII<- filter(NMC, NMC$year %in% WWII_range)
alliedP2<- filter(WWII, WWII$ccode %in% allies_ccode)
alliedP2$side = "Allies"
for( i in c(1:length(allies_ccode))){
  alliedP2$country[alliedP2$ccode == allies_ccode[i]] <- paste(allies[i], "(Allies)", sep = " ")
}

axisP<- filter(WWII, WWII$ccode %in% axis_ccode)
axisP$side = "Axis"
for( i in c(1:length(axis_ccode))){
  axisP$country[axisP$ccode == axis_ccode[i]] <- paste(axis[i], "(Axis)", sep = " ")
}

WWII <-rbind(alliedP2, axisP)

ww2 <- ggplot() + 
  labs(color ='Country')+
  xlab("Year") +
  ylab("CINC")+
  geom_rect(data=d, mapping=aes(xmin=1939, xmax=1945, ymin=0, ymax=.4),alpha=0.05, fill ="paleturquoise3") +
  geom_line(data = WWII, aes(x = year, y = cinc, color = country, group = stateabb)) + 
  facet_wrap(~side) + 
  theme_classic()+
  scale_color_brewer(palette="Paired")+
  ggtitle("CINC Score: WWII Major Players")+
  theme(plot.title = element_text(hjust = .5),legend.position="right")

```

```{r fig.width=10, fig.height=8}
grid.arrange(ww1, ww2, nrow=2)

```
As mentioned, after WWI the US’s CINC spiked right after the war and then decreased steadily until the beginning of WWII. Russia’s CINC dropped but recovered quickly and stayed on a relatively upward trend until WWII. Unlike the US and Russia, the United Kingdom’s CINC decreased steadily after the war. Italy’s and Japan’s CINC remained steady. Among the Central Powers, Germany’s CINC dropped after WWI and did not rise again. Turkey’s, Romania’s and Bulgaria’s CINC remained steady. The Austro-Hungarian CINC disappears after the war because the Austro-Hungarian Empire was dissolved at the end of the war.

With WWII, we see a similar pattern for the US: its CINC peaks at the end of WWII and then decreases steadily until the Korean War. Russia also repeats its WWI pattern, with its CINC score reaching a low point towards the end of WWII and then increasing steadily until the Korean War. The UK’s CINC likewise peaks right after the war and then decreases steadily throughout the Cold War period. Among the Axis powers, Germany’s and Japan’s CINC drop off.

To explore the patterns above, we looked into the components that make up the CINC. We chose to focus on the major powers because they had the most drastic changes during this time period. We looked at both the actual values and the ratios: the absolute values gradually increased over time, whereas the ratios show performance relative to the other countries each year. Looking at the ratios helped us see trends that were not easy to spot in the overall values. Below is a plot of three of the six CINC components (iron and steel production, military expenditures and military personnel), showing both the values and the ratios.
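
To make the difference between values and ratios concrete, here is a minimal sketch on toy data. It assumes that a ratio is a country’s value divided by the total over all countries in the same year; the data frame `toy` and its numbers are made up for illustration and are not taken from the NMC data.

```r
# Toy data (hypothetical values, not taken from the NMC data set)
toy <- data.frame(
  year  = c(1913, 1913, 1913, 1914, 1914, 1914),
  ccode = c(2, 200, 255, 2, 200, 255),
  irst  = c(30, 10, 10, 40, 10, 30)
)

# Assumed definition: a country's ratio is its value divided by the
# total over all countries observed in the same year.
toy$irst_ratio <- toy$irst / ave(toy$irst, toy$year, FUN = sum)

toy
```

In this toy example the first country’s absolute value rises from 30 to 40 while its share falls, which is the kind of trend that only the ratios reveal.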




```{r fig.width=25, fig.height=20 , echo=FALSE}

mex <- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Military Expenditures") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$milex), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp, aes(x = year, y = milex, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


mip<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Military Personnel ") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$milper), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp, aes(x = year, y = milper, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


nrg<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Primary Energy Consumption") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$pec), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp, aes(x = year, y = pec, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


ias<-ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Iron and Steel Production") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$irst), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp, aes(x = year, y = irst, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


urp<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Urban Population") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$upop), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp, aes(x = year, y = upop, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")

top<-ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Total Population") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$tpop), fill=Conflict),alpha=0.15) +
  geom_line(data = NMC_mp, aes(x = year, y = tpop, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


#grid.arrange(ias, mex, mip, nrg, urp, top, nrow = 3)


NMC_mp_ratios <- filter(NMC_ratios, NMC_ratios$year %in% all_year)
# Filter the year-restricted data so the year filter from the previous line is kept
NMC_mp_ratios <- filter(NMC_mp_ratios, NMC_mp_ratios$ccode %in% country_code)
NMC_mp_ratios$country = ""
for( i in c(1:length(country_code))){
  NMC_mp_ratios$country[NMC_mp_ratios$ccode == country_code[i]] <- countries[i]
}


mex_r<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Military Expenditures Ratio") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp_ratios$milex), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp_ratios, aes(x = year, y = milex, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")

mip_r<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Military Personnel Ratio") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp_ratios$milper), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp_ratios, aes(x = year, y = milper, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


nrg_r<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Primary Energy Consumption Ratio") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp_ratios$pec), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp_ratios, aes(x = year, y = pec, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


ias_r<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Iron and Steel Production Ratio") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp_ratios$irst), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp_ratios, aes(x = year, y = irst, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")


urp_r<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Urban Population Ratio") +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp_ratios$upop), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp_ratios, aes(x = year, y = upop, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="none")



top_r<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Total Population Ratio") +
  labs(color ='Country')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp_ratios$tpop), fill=Conflict),alpha=0.1) +
  geom_line(data = NMC_mp_ratios, aes(x = year, y = tpop, color = country, group = country)) +
  #ggtitle("CNIC by Year for Major Powers Today") + 
  theme_classic() +
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5),legend.position="bottom",legend.key.size = unit(1, "cm"), legend.title=element_text(size=10) , legend.text=element_text(size=10))
  


g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)}

mylegend<-g_legend(top_r)


#grid_arrange_shared_legend(ias, ias_r, mex, mex_r, mip, mip_r, urp, urp_r, top, top_r, nrow = 6)


```



```{r fig.width=16, fig.height=14}

grid.arrange(arrangeGrob(ias, ias_r, mex, mex_r, mip, mip_r + theme(plot.title = element_text(hjust = .5), legend.position="none"),nrow=3, ncol =2),mylegend, heights=c(10,1))

#grid.arrange(ias, ias_r, mex, mex_r, mip, mip_r, urp, urp_r, top, top_r, nrow=6)
```


The Iron and Steel Production ratio follows the same pattern as the US CINC during this time period. The ratio peaks towards the end of WWI and WWII and decreases between the two wars. Until about the beginning of the Vietnam War, roughly 1955, the US dominated the world in iron and steel production, so this value had a huge impact on the US’s CINC score. During most wars the US had the highest production, but it lost that position towards the end of the Vietnam War, when Russia surpassed it. The United Kingdom maintained its iron and steel production, but because the US and Russia were increasing theirs, the UK’s ratio has been decreasing steadily since the 1900s, very similar to the pattern observed in its CINC score.

Looking at military expenditures, we see that the US and Russia invested significantly more in their militaries throughout the Cold War. On the other hand, both countries reduced their military personnel after the peak at the end of WWII. These findings are consistent with the Cold War arms race, in which the US and Russia invested heavily in military technology but did not engage in any large-scale battles. The US’s military expenditures ratio peaks in the same year as its CINC score and its iron and steel production ratio.

In the few years before WWII, Germany’s military expenditures ratio increased quite rapidly, and its military personnel ratio rose sharply in the year immediately before the war. Although not as drastic, Japan and China follow a similar pattern: their military investment increased significantly a few years prior to WWII and the Korean War, respectively, and their military personnel ratios increased sharply right before the wars. This suggests that in the years before the wars, these countries started investing in and preparing their militaries for war. For the countries on the reactive side (the US, Russia and the UK), the military production and military expenditures ratios only increased during the war.
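
One way to quantify a pre-war build-up like this is to compare a ratio at the start of a war with its value a few years earlier. The sketch below is only an illustration: the helper `pre_war_change`, the five-year window and the numbers in `toy_milex` are all assumptions, not part of our analysis.

```r
# Toy military expenditures ratio for one country (hypothetical values)
toy_milex <- data.frame(
  year  = 1933:1939,
  milex = c(0.05, 0.06, 0.09, 0.12, 0.16, 0.22, 0.25)
)

# Hypothetical helper: change in the ratio over the `window` years
# leading up to `war_start`
pre_war_change <- function(df, war_start, window = 5) {
  start_val <- df$milex[df$year == war_start - window]
  end_val   <- df$milex[df$year == war_start]
  end_val - start_val
}

pre_war_change(toy_milex, war_start = 1939)
```

A large positive value would point to a build-up before the war, while a value near zero would match the reactive pattern described above.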



```{r fig.width=10, fig.height=3}
NMC_v<- filter(NMC, NMC$year %in% c(1970:2000))
NMC_ratios_v<- filter(NMC_ratios, NMC_ratios$year %in% c(1970:2000))
#names <- c("North Korea", "South Korea", "Afghanistan"  , "Vietnam", "Republic of Vietnam")
#code<- c(731, 732,700, 816, 817)
names <- c("Iraq")
code <- c(645)
NMC_v<- filter(NMC_v, NMC_v$ccode %in% code)
NMC_ratios_v<- filter(NMC_ratios_v, NMC_ratios_v$ccode %in% code)
for( i in c(1:length(code))){
  NMC_ratios_v$country[NMC_ratios_v$ccode == code[i] ]<- names[i]
  NMC_v$country[NMC_v$ccode == code[i] ]<- names[i]
}


a<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Ratio") + 
  geom_line(data = NMC_ratios_v, aes(x = year, y = milex), colour = "indianred3") +
  ggtitle("Iraq Military Expenditures Ratio") + 
  geom_rect(data=d, mapping=aes(xmin=1990, xmax=1991, ymin=0, ymax=.025),alpha=0.03) +
  theme_classic() +
  theme(plot.title = element_text(hjust = .5, size =10 ),   axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")

b<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Ratio") +
  geom_line(data = NMC_ratios_v, aes(x = year, y = milper), colour= "royalblue3") +
  ggtitle("Iraq Military Personnel Ratio") + 
  geom_rect(data=d, mapping=aes(xmin=1990, xmax=1991, ymin=0, ymax=.05), alpha=0.03) +
  theme_classic() +
  theme(plot.title = element_text(hjust = .5, size =10 ),   axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")

grid.arrange(a,b,nrow = 1)
```

We wanted to see if this same pattern was present in other conflicts. In the Gulf War, Iraq invaded Kuwait and was met with international condemnation; the US and other nations joined forces to stop Iraq. Before the invasion, we see Iraq’s military expenditures and personnel ratios increasing. The gray shaded box indicates the time period of the Gulf War (1990-1991). In the 1980s, Iraq’s military expenditure ratio increased drastically, and then there was a sudden spike in military personnel right before the start of the war.


```{r fig.width=14, fig.height=6}
ct <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Italy", "Japan", "China")
ct_ccode <- member_alliances$ccode[match(ct, member_alliances$state_name)]

NMC_ct <- filter(NMC, NMC$year %in% all_year)
NMC_ct <- filter(NMC_ct, NMC_ct$ccode %in% ct_ccode)
for( i in c(1:length(ct_ccode))){
  NMC_ct$country[NMC_ct$ccode == ct_ccode[i]] <- ct[i]
}

a<- ggplot(NMC_ct, aes(country, year, fill = cinc)) + geom_tile()+
  xlab("Country") +
  ylab("Year") +
  scale_fill_viridis() + 
  ggtitle("CINC Heatmap for Major Powers") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5))+ 
  coord_flip() +
  labs(fill ='CINC')


topCas <- c("Netherlands", "Yugoslavia", "Lithuania", "Poland", "Austria", "Hungary", "Romania", "Estonia", "Luxembourg")
topCas_ccode <- member_alliances$ccode[match(topCas, member_alliances$state_name)]


NMC_cas <- filter(NMC, NMC$year %in% all_year)
NMC_cas <- filter(NMC_cas, NMC_cas$ccode %in% topCas_ccode)
for( i in c(1:length(topCas_ccode))){
  NMC_cas$country[NMC_cas$ccode == topCas_ccode[i]] <- topCas[i]
}

b<- ggplot(NMC_cas, aes(country, year, fill = cinc)) + geom_tile()+
  xlab("Country") +
  ylab("Year") +
  scale_fill_viridis() + 
  ggtitle("CINC Heatmap for Countries with the most Holocaust Casualties") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip() +
  labs(fill ='CINC')
  

grid.arrange(a,b, nrow = 1 )
```

Finally, using the heat maps, we looked at the impact of a war on the CINC. On the left is a heat map of the major powers. You can see that France, Germany and Japan have missing CINC values at the end of World War II. Those periods of missing values correspond to the recovery period for each of those countries following the war. Next, we looked to see what other countries had gaps in their CINC data. Most of the countries with gaps are Eastern European countries that were heavily impacted by the Holocaust and by the struggle to end Communism in the region. This indicates that countries going through intense destruction or reformation tend to have no CINC information for those years.
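
The gaps visible in the heat maps can also be listed directly. The following is a minimal sketch on toy data; it assumes that a gap shows up as an absent country-year row (rather than an `NA` value), and the data frame `obs` and its years are hypothetical.

```r
# Toy data with hypothetical years; country "B" has no rows for 1941-1944
obs <- data.frame(
  country = c(rep("A", 6), rep("B", 2)),
  year    = c(1940:1945, 1940, 1945)
)

# For each country, list the years inside its own observed span
# for which no row exists
missing_years <- lapply(split(obs$year, obs$country), function(y) {
  setdiff(seq(min(y), max(y)), y)
})

missing_years
```

If the gaps were stored as `NA` values instead, filtering on `is.na(cinc)` would be the equivalent check.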



**Next Steps**
To get into more detail and context around certain characteristics, we want to look at specific events such as Pearl Harbor or Japan’s invasion of Malaya, and other pivotal moments in wars, to see how those events impact the CINC and its various components.

A problem with NMC is that many factors beyond the six NMC components determine the power of a nation. One major consideration that is not taken into account is diplomatic relations and dealings. Diplomatic relations play a major role in the prevention and conclusion of conflicts. With this data, it was not possible to factor that in.

Another thing to consider is differences in policy between countries. We see that military expenditures have been increasing for the US since the Cold War, but Russia’s military expenditures take a sudden drop at the end of the Cold War. Since then Russia has continued to cut military spending. Even with its participation in the Afghanistan War, Russia’s military expenditures have not increased. In contrast, politicians in the US today are proposing federal budgets with increases in military spending. This difference is due to the differing policies of the two countries. Thus the reactions of countries to events will vary drastically based on their policies, and it becomes hard to distinguish an overall pattern.

Another drawback of NMC is that it cannot take into account changes in universal priorities. For example, with increased concern about climate change and scarce natural resources, and with advancements in technology, iron and steel production might decrease drastically in the future, so it may no longer be a valid measure of power. Similarly, advancements in technology could decrease the need for military personnel. The issue with NMC is that it cannot take such policy concerns and changes into consideration when measuring national power.




#### Alliances


```{r fig.width=14, fig.height=8}

dayd_al_year <- filter(dyad_al_year, dyad_al_year$year %in% c(1900:2012))
dyad_al_year$length = dyad_al_year$dyad_end_year - dyad_al_year$dyad_st_year

dayd_al_year$conflict <- "0"
dyad_al_year$count <- 1 

dir_alliances <- gather(dir_al_year, treaty_type, idicator, defense:entente)
dir_alliances <- dir_alliances[!dir_alliances$idicator %in%  0,]
dir_alliances$dyad_end_year[dir_alliances$dyad_end_year %in% NA] = 2016
dir_alliances <- dir_alliances[dir_alliances$year>1900,]
alliance_count <- dir_alliances[, c(2,3,14,16)]
alliance_count$count <- 1 

gp_ct <- aggregate(cbind(count) ~ ccode1+state_name1+year+treaty_type, data =alliance_count, FUN = sum )
gp_ct$Conflict <- "0"

for(i in c(1:length(d$x1))){
  gp_ct[gp_ct$year >= d$x1[i] & gp_ct$year <= d$x2[i], length(gp_ct)] <- as.character(i)
}

```


WWI was triggered by the assassination of Archduke Franz Ferdinand of Austria. His death set off a diplomatic crisis as countries that were not involved in the original conflict were forced to get involved. Once Austria declared war on Serbia over the death of the Archduke, Russia stepped in to defend Serbia. Once Russia entered the conflict, Germany was drawn in through its alliance with Austria. During the conflict Germany invaded Belgium; in response, the United Kingdom mobilized due to its alliance with Belgium. This pattern continued until it involved all the major powers of the world in a devastating war. Such alliances were a major cause of World War I. Since then the number of alliances has only grown. For this reason, we wanted to look at alliances and see how they change during wars.

Below is a boxplot of the total number of alliances in effect each year between any two countries. The median number of alliances jumped significantly during WWII, continued to grow during the Cold War, and has remained relatively level since then. An interesting pattern is that the median number of alliances increased more in the 1-3 years before the end of a war. You can see this pattern with WWI, the Korean War, the Vietnam War and the end of the Cold War. Although the Cold War was a state of severe political tension rather than open war, there were many regional battles and the threat of a large-scale military war was constant. The number of alliances increased significantly from the start of the Cold War until its end.
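
The yearly medians that the boxplot displays can also be computed as a table. This is a minimal sketch on toy data, assuming one row per state and year with a `count` column; the data frame `toy_ct` and its numbers are hypothetical.

```r
# Toy per-state alliance counts (hypothetical numbers)
toy_ct <- data.frame(
  state = c("A", "B", "C", "A", "B", "C"),
  year  = c(1950, 1950, 1950, 1951, 1951, 1951),
  count = c(1, 3, 5, 2, 6, 10)
)

# Median number of alliances per state in each year
median_by_year <- aggregate(count ~ year, data = toy_ct, FUN = median)

median_by_year
```

Comparing consecutive rows of such a table would make a jump in the median, like the one seen during WWII, easy to locate.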


```{r fig.width=20, fig.height=8}
ggplot() +
  xlab("Year") +
  ylab("Count")+
  geom_boxplot(data = gp_ct, aes(x = as.factor(year), y = count, fill = Conflict)) +
  ggtitle("Total Alliances by Year") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = .5, size = 20), axis.text.x = element_text(angle = 90, size = 10),  axis.text.y = element_text(size = 15),  axis.title.y = element_text(size = 15), axis.title.x = element_text(size = 15),  legend.position="bottom", legend.text = element_text(size=15),  legend.title = element_text(size=15))+
  scale_fill_manual(values=c("white", "lightsteelblue3", "pink3", "paleturquoise3", "lightsteelblue2", "lightsteelblue4","salmon" ), labels = mylables)

```
Next we looked at the types of alliances formed during this time. The COW data reports four types of alliances: defense, neutrality, non-aggression and entente. In a defense alliance, the member states agree to defend one or more states in the alliance in the event of a conflict. In a neutrality alliance, the members agree to remain neutral towards one another. In a non-aggression alliance, the members agree to take no military action against one another. Finally, in an entente alliance there is an understanding that the states will consult with one another if a crisis occurs.

The plots below show the number of alliances by alliance type. The top row shows the number of new alliances formed each year and the second row shows the number of alliances terminated that year. Please note that an alliance formed among 4 states adds 6 new alliances to the data set, because there is one alliance between each pair of the 4 members. Similarly, terminating an alliance among 4 states removes 6 alliances.
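
The count of 6 comes from the number of unordered pairs among 4 members. As a quick check in base R (the helper name `dyads_in_alliance` and the member labels are made up for illustration):

```r
# Number of unordered pairs (dyads) among the members of one alliance
dyads_in_alliance <- function(n_members) choose(n_members, 2)

dyads_in_alliance(4)  # 6

# List the pairs explicitly for four hypothetical members
members <- c("A", "B", "C", "D")
pairs <- t(combn(members, 2))
pairs
```

Because the number of pairs grows roughly with the square of the membership, one large multilateral treaty can dominate the counts for the year in which it starts or ends.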


```{r fig.width=12, fig.height=8}
dir_al_0 <- filter(dir_al, dir_al$dyad_st_year %in% c(1900:2012))
all_st <- gather(dir_al_0, treaty_type, idicator, defense:entente)
all_st <- all_st[!all_st$idicator %in%  0,]
all_st$dyad_end_year[all_st$dyad_end_year %in% NA] = 2016
al_st_count <- all_st[, c(3,5,8,11,15)]
al_st_count$count <- 1 

gp_st <- aggregate(cbind(count) ~ dyad_st_year+treaty_type, data =al_st_count, FUN = sum )

a <- ggplot() +
  xlab("Year") +
  ylab("Count")+
  geom_bar(data = al_st_count, aes(x = dyad_st_year)) +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=450, fill=Conflict),alpha=0.2)+
  ggtitle("Number of Alliances Formed") +
  theme_classic()+
  theme(legend.position="bottom", plot.title = element_text(hjust = .5)) + 
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) + facet_wrap(~treaty_type, nrow = 1 )

b<- ggplot() +
  xlab("Year") +
  ylab("Count")+
  geom_bar(data = al_st_count[al_st_count$dyad_end_year < 2016, ], aes(x = dyad_end_year)) +
  ggtitle("Number of Alliances Terminated")+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(450), fill=Conflict),alpha=0.2)+
  theme_classic()+
  theme(legend.position="bottom", plot.title = element_text(hjust = .5)) + 
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) + facet_wrap(~treaty_type, nrow = 1)

grid.arrange(a,b, nrow =2)


```

Most of the alliances were formed at the end of WWII and during the Cold War. Also, the most frequently formed alliances were defense and entente. Surprisingly, years that see a large increase in the number of alliances formed also see an increase in the number of alliances terminated. To dig in further and get a better understanding of what types of treaties were formed and why they ended, we looked at individual countries.

The following charts are all organized the same way: they show a timeline from the start of each alliance to either its end (shown in red) or 2012 if the alliance was still in effect as of December 31, 2012 (shown in blue). The charts are faceted by alliance type because many of the types overlap. For example, one alliance could be both a defense and an entente alliance, so we separated the types to get a clearer visual representation. We focused this part of the analysis on the United States, United Kingdom, Germany, Russia and China because these countries are major powers and are thus involved in many of the military alliances throughout history.
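
The red/blue split described above depends on labelling each alliance as ended or ongoing. Below is a minimal sketch of one way to do that labelling. The helper `alliance_status` is hypothetical, and it assumes that a missing end year means the alliance was still in effect at the end of the observation window.

```r
# Hypothetical helper: label each alliance as "Ended" or "Ongoing".
# Assumption: a missing end year means the alliance was still in effect
# at the end of the observation window (2012 by default).
alliance_status <- function(end_year, last_observed = 2012) {
  ifelse(!is.na(end_year) & end_year < last_observed, "Ended", "Ongoing")
}

alliance_status(c(1945, NA, 2012, 1991))
```

Using a single helper like this for every country would keep the colour coding consistent across the charts.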
 
 


**United States of America**

```{r fig.width=12, fig.height=9}
al_us_yr <- filter(dir_al_year, dir_al_year$state_name1 %in% "United States of America")
al_us_yr <- gather(al_us_yr, Treaty, idicator, defense:entente)   
al_us_yr <- al_us_yr[!al_us_yr$idicator %in%  0,]
al_us_yr$dyad_end_year[al_us_yr$dyad_end_year %in% NA] = 2016
al_us_yr <- al_us_yr[, c(5,8,11,14,16)]
al_us_yr$count = 1 
# Missing end years were recoded to 2016 above, so any alliance ending
# in 2012 or later is treated as ongoing
al_us_yr$Status <- ifelse(al_us_yr$dyad_end_year < 2012, "Ended", "Ongoing")

ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_us_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("US Alliances ")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 

```

Looking at the alliances for the US, we see that most defense alliances are still in effect today. A handful of alliances with South American countries ended towards the end of WWII, but the US immediately entered into a different alliance with those same countries. The treaty now in effect with the South American countries is the Inter-American Treaty of Reciprocal Assistance (Rio Pact), under which an attack against one country is considered an attack against all the American countries in the alliance. This alliance was signed in 1947 and continues to this day.

You can also see a similar pattern of ongoing alliances for the defense, entente and non-aggression treaty types. NATO, a defense, entente and non-aggression alliance, was formed in 1949 and is still in effect today. NATO involves 28 countries and accounts for the high number of alliances formed in 1949 for these three types.

The entente alliances follow a similar pattern in which an alliance ended and was immediately re-formed. For a few countries, an entente alliance was formed towards the end of the Korean War and ended a few years after the end of the Vietnam War. The majority of the countries that follow this pattern are in Asia or Australia, which is reasonable considering they were participants in the Vietnam War. The defense and entente alliance between the US and Cuba also ended during the Vietnam War, as indicated in the graph above, when Cuba was providing military support to the Vietnamese. In addition, during the Vietnam War there was a neutrality alliance for a few years between the US and other countries that participated in the war. This alliance, the International Agreement on the Neutrality of Laos, started in 1961 and was terminated when the Democratic Republic of Vietnam violated the terms of the treaty 2 years later.

**United Kingdom**


```{r fig.width=12, fig.height=9}
dir_al_year2 <- filter(dir_al_year, dir_al_year$dyad_st_year %in% c(1900:2016))
al_gb_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("United Kingdom"))
al_gb_yr <- gather(al_gb_yr, Treaty, idicator, defense:entente)   
al_gb_yr <- al_gb_yr[!al_gb_yr$idicator %in%  0,]

al_gb_yr <- al_gb_yr[, c(3,5,8,11,14,16)]
al_gb_yr$count = 1 
al_gb_yr$Status[al_gb_yr$dyad_end_year < 2012] <- "Ended"
al_gb_yr$Status[al_gb_yr$dyad_end_year== 2012] <- "Ongoing"

ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_gb_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("United Kingdom Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 

```


The United Kingdom follows the same pattern as the United States. Most of its defense alliances started towards the end of WWII and are still in effect today. The United Kingdom is part of NATO, so we see a high number of defense, entente and non-aggression alliances that were formed in 1949 and continue today. The UK was also party to the Neutrality of Laos agreement, which is visible in the neutrality plot.


**Germany**


```{r fig.width=12, fig.height=6}
al_gmy_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("Germany"))
al_gmy_yr <- gather(al_gmy_yr, Treaty, idicator, defense:entente)   
al_gmy_yr <- al_gmy_yr[!al_gmy_yr$idicator %in%  0,]

al_gmy_yr <- al_gmy_yr[, c(5,8,11,14,16)]
al_gmy_yr$count = 1 
al_gmy_yr$Status[al_gmy_yr$dyad_end_year < 2012] <- "Ended"
al_gmy_yr$Status[al_gmy_yr$dyad_end_year== 2012] <- "Ongoing"

ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_gmy_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("Germany Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 


```
As Germany was heavily impacted by WWII, we see all of its alliances end by the end of the war. During its recovery period, Germany was not part of any military alliance. This is to be expected, since Germany was so devastated by the war that it could not afford to maintain a military. Germany was also going through a period of civil unrest. In 1990, after East and West Germany combined to form a unified Germany, Germany became part of NATO.



**Russia**


```{r fig.width=12, fig.height=6}
al_rus_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("Russia"))
al_rus_yr <- gather(al_rus_yr, Treaty, idicator, defense:entente)   
al_rus_yr <- al_rus_yr[!al_rus_yr$idicator %in%  0,]

al_rus_yr <- al_rus_yr[, c(5,8,11,14,16)]
al_rus_yr$count = 1 
al_rus_yr$Status[al_rus_yr$dyad_end_year < 2012] <- "Ended"
al_rus_yr$Status[al_rus_yr$dyad_end_year== 2012] <- "Ongoing"

ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data =al_rus_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("Russia Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 

```

**China**



```{r fig.width=12, fig.height=6}
al_china_yr <- filter(dir_al_year2, dir_al_year2$state_name1 %in% c("China"))
al_china_yr <- gather(al_china_yr, Treaty, idicator, defense:entente)   
al_china_yr <- al_china_yr[!al_china_yr$idicator %in%  0,]

al_china_yr <- al_china_yr[, c(5,8,11,14,16)]
al_china_yr$count = 1 
al_china_yr$Status[al_china_yr$dyad_end_year < 2012] <- "Ended"
al_china_yr$Status[al_china_yr$dyad_end_year== 2012] <- "Ongoing"

ggplot() + 
  xlab("Year") +
  ylab("Country")+
  geom_point(data = al_china_yr, aes(x = year, y = state_name2, color = Status), alpha = .5) + 
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) + 
  ggtitle("China Alliances")+
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1 ) +
  theme(legend.position="right", plot.title = element_text(hjust = .5)) 
```
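
The country chunks above repeat the same preparation steps (filter by country, reshape the treaty columns, drop zero indicators, label the status), so a small helper could reduce copy-paste slips between chunks. The sketch below is only an illustration: `prep_alliances()` and the toy data frame are hypothetical and not part of the original analysis, and the helper assumes the treaty columns sit next to each other in the order `defense` to `entente`, as in the chunks above. Unlike the chunks, it labels any end year of 2012 or later as "Ongoing".

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not in the original analysis): prepares one country's
# alliance rows in the same way as the per-country chunks.
prep_alliances <- function(data, country, last_year = 2012) {
  data %>%
    filter(state_name1 == country) %>%
    gather(Treaty, indicator, defense:entente) %>%   # one row per treaty type
    filter(indicator != 0) %>%                       # keep treaty types that apply
    mutate(Status = ifelse(dyad_end_year < last_year, "Ended", "Ongoing")) %>%
    select(state_name2, dyad_st_year, dyad_end_year, year, Treaty, Status)
}

# Toy stand-in for dir_al_year2; the values are illustrative, not real data.
toy <- data.frame(
  state_name1   = c("Germany", "Germany", "Russia"),
  state_name2   = c("France", "Italy", "China"),
  dyad_st_year  = c(1990, 1936, 1950),
  dyad_end_year = c(2012, 1943, 1980),
  year          = c(2000, 1940, 1960),
  defense       = c(1, 1, 1),
  neutrality    = c(0, 0, 0),
  nonaggression = c(0, 1, 1),
  entente       = c(1, 0, 0)
)

germany <- prep_alliances(toy, "Germany")
print(germany)
```

With a helper of this kind, each country plot would only need a different `country` argument and title, which makes it harder to plot one country's data under another country's heading.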


**Conclusions**

Since we are mainly focusing on large-scale wars that involved many countries, many treaties were created and broken. For example, the Warsaw Pact was created as a counterweight to NATO, which was formed after the end of WWII. The US, Great Britain and their allies became part of NATO, while the Soviet Union and its allies became part of the Warsaw Pact. Once the USSR dissolved, many of its satellite nations, which had been Warsaw Pact members, joined NATO. With NATO and the formation of the United Nations, it is hard to say which countries will participate in the next war. For example, before the beginning of the War in Afghanistan, the Security Council had to authorize the United States and its NATO allies to organize an offensive against al-Qaeda. This type of regulation makes it hard to determine how future wars will play out. One interesting observation is that once a treaty falls apart, its members try to join another treaty, which is why we see spikes in the median number of alliances towards the end of the wars.
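
As a rough illustration of how a median number of alliances per year can be computed from directed yearly rows, the sketch below counts each country's distinct partners in a year and then takes the median across countries. The toy data and the names `toy` and `median_alliances` are assumptions for this example and may differ from the computation used earlier in this report. Countries with no alliance in a given year have no rows, so they are left out of the median.

```r
library(dplyr)

# Toy directed dyad-year rows; countries A, B and C are placeholders.
toy <- data.frame(
  state_name1 = c("A", "A", "B", "C", "A", "B", "B", "C", "C"),
  state_name2 = c("B", "C", "A", "A", "B", "A", "C", "A", "B"),
  year        = c(1944, 1944, 1944, 1944, 1945, 1945, 1945, 1945, 1945)
)

median_alliances <- toy %>%
  group_by(year, state_name1) %>%
  summarise(partners = n_distinct(state_name2), .groups = "drop") %>%  # partners per country-year
  group_by(year) %>%
  summarise(median_partners = median(partners), .groups = "drop")      # median across countries

print(median_alliances)
```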

One of the difficulties of working with this data set is that it was impossible to tell which alliances were part of a larger treaty. For example, if there was a data point for an alliance between the US and the UK in 1967, there was no indication of whether it belonged to NATO or some other treaty. This also made it hard to tell when a country joined an existing alliance. For example, when Germany joined NATO there were data points for an alliance between Germany and each NATO member, but it is not easy to discern that Germany joined NATO without some outside research.
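
One rough heuristic for spotting a possible accession to a multilateral treaty is to look for years in which a country gains several new partners at once. The sketch below is only an illustration on toy data: the threshold of three partners and the names `toy`, `new_partners` and `likely_accessions` are assumptions, and a flagged year would still need to be checked against outside sources.

```r
library(dplyr)

# Toy rows for one country; the values are illustrative, not real data.
toy <- data.frame(
  state_name1  = "Germany",
  state_name2  = c("France", "Italy", "Canada", "Japan", "France", "Italy"),
  dyad_st_year = c(1990, 1990, 1990, 1936, 1990, 1990),
  year         = c(1990, 1990, 1990, 1936, 1991, 1991)
)

new_partners <- toy %>%
  distinct(state_name2, dyad_st_year) %>%                      # one row per partner and start year
  group_by(dyad_st_year) %>%
  summarise(partners = n_distinct(state_name2), .groups = "drop") %>%
  arrange(dyad_st_year)

# Years in which at least three partnerships start together (assumed threshold).
likely_accessions <- filter(new_partners, partners >= 3)
print(likely_accessions)
```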

The other drawback of this data set is that it only considers formal military alliances. It does not take into account other kinds of ties, such as the United Nations and its Security Council or trade agreements. For example, Japan is not in any military alliance currently but it does have very close ties to the United States today, and this information is not captured in the data set.

  
**Trade** 
   
     
**Militarized Interstate Disputes** 
  
  
## Conclusion 
**Discuss limitations and future directions, lessons learned**

## Sources 

